1 Programming for Data Science : Melbourne’s Real Estate

library(knitr)
knitr::include_graphics("data_input/image.jpg")

Img Source : https://i2.au.reastatic.net/800x600/4a991d2ad68caac22534dd9368493089cb544fac2efe20a7e8050bdc8ee2aedd/image.jpg

link: https://www.kaggle.com/datasets/dansbecker/melbourne-housing-snapshot

Melbourne, positioned in the southeastern part of Australia, is home to a dynamic real estate scene that mirrors the city’s diverse and evolving character. From the historic charm of Victorian-era homes nestled in inner suburbs to the modern skyline dominating the central business district, Melbourne’s real estate is a tangible reflection of its rich history and contemporary lifestyle. As an integral aspect of urban living, real estate in Melbourne serves as a lens through which one can observe the city’s economic vitality and changing societal preferences. The real estate market in Melbourne is not just about properties; it’s a narrative of the city’s past, present, and future.

This data was scraped from publicly available results posted every week from https://Domain.com.au. The dataset includes Address, Type of Real estate, Suburb, Method of Selling, Rooms, Price, Real Estate Agent, Date of Sale and distance from C.B.D.

1.1 Notes on Specific Variables

  • Rooms: Number of rooms

  • Price: Price in dollars

  • Method: S - property sold; SP - property sold prior; PI - property passed in; PN - sold prior not disclosed; SN - sold not disclosed; NB - no bid; VB - vendor bid; W - withdrawn prior to auction; SA - sold after auction; SS - sold after auction price not disclosed. N/A - price or highest bid not available.

  • Type: br - bedroom(s); h - house,cottage,villa, semi,terrace; u - unit, duplex; t - townhouse; dev site - development site; o res - other residential.

  • SellerG: Real Estate Agent

  • Date: Date sold

  • Distance: Distance from CBD

  • Regionname: General Region (West, North West, North, North east …etc)

  • Propertycount: Number of properties that exist in the suburb.

  • Bedroom2 : Scraped of Bedrooms (from different source)

  • Bathroom: Number of Bathrooms

  • Car: Number of carspots

  • Landsize: Land Size

  • BuildingArea: Building Size

  • CouncilArea: Governing council for the area

The First step is inserting the csv file into R located in data_input and then installing the necessary plugins including dplyr, ggplot2, and leaflet that will be used.

#Read data csv
melb <- read.csv("data_input/melb_data.csv")

library(dplyr) #install package dplyr to manipulate character data
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2) #install package ggplot2 for data visualization
## Warning: package 'ggplot2' was built under R version 4.3.2
library(leaflet) #install package leaflet for map plotting

2 Data Preparation

Next, we will inspect the ‘melb’ folder and then remove duplicate rows based on the ‘Address’ column. Additionally, we will eliminate any unnecessary columns.

#First, we read the .csv file and use only the relevant columns
melb
#Clean duplicates based on address
melb <- melb[!duplicated(melb$Address),]
# Removing unnecessary columns using select()
melb2 <- melb[,!names(melb) %in% c("Address", "Postcode", "Propertycount", "CouncilArea", "YearBuilt", "BuildingArea")]
# Removing NA Rows and "0"
melb2 <- melb2[melb2$Landsize != 0, ]

After removing the unnecessary rows and columns, we will proceed to convert the data types to facilitate plotting. The firt column we convert is melb$Date using the “%d/%m/%Y”, and the rest as Factor and integer.

# Converting Date datatype
#Converting Date
melb2$Date <- as.Date(melb2$Date, format = "%d/%m/%Y") 

#Converting as Factor & Integer
mutate(.data = melb2,
       Suburb = as.factor(Suburb),
       Method = as.factor(Method),
       Type = as.factor(Type),
       Regionname = as.factor(Regionname),
       Bedroom2 = as.integer(Bedroom2),
       Bathroom = as.integer(Bathroom),
       Car = as.integer(Car),
       Landsize = as.integer(Landsize)
       )

Next, we will rename the categories in the ‘Regionname’ and ‘Type’ columns to make them easier to read and eliminate any unnecessary duplicates.

#Renaming Categories for Regionname
melb2 <- melb2 %>%
  mutate(Regionname = recode(Regionname, "Northern Victoria" = "Northern Metropolitan"), 
        Regionname = recode(Regionname, "Eastern Victoria" = "Eastern Metropolitan"),
        Regionname = recode(Regionname, "Western Victoria" = "Western Metropolitan"))

#Renaming Categories for Type
melb2 <- melb2 %>%
  mutate(Type = recode(Type, "h" = "Housing"), 
        Type = recode(Type, "u" = "Unit"),
        Type = recode(Type, "t" = "Townhouse"))

3 Data Explortaion

Calculate the price per square meter for each property in the dataset and then removes any outliers (values outside the 1.5*IQR range) from the ‘PricePerSQM’ column, resulting in a dataset with potentially more reliable data for analysis or visualization.

# Creating Price Per Squaremeter Column
melb2$PricePerSQM <- (melb2$Price / melb2$Landsize)

#Removing outlier
Q1 <- quantile(melb2$PricePerSQM, 0.25)
Q3 <- quantile(melb2$PricePerSQM, 0.75)
IQR <- Q3 - Q1
LB <- Q1 - 1.5*IQR
UB <- Q3 + 1.5*IQR

melb2 <- subset(melb2, PricePerSQM >= LB & PricePerSQM <= UB)

According to the histogram below, the most frequent number of rooms is 3.

#Number of Rooms in Building
hist(melb2$Rooms, col = c('lightblue'), main = "Number of Rooms", xlab = "Rooms", density = 50)

#mean(melb2$Rooms) # Center of Data = 2.977419 (Rooms)

Based on the histogram below, the most frequently occurring distance to the CBD falls within the range of 5 to 15, with a median distance of 10.1.

#Distance
hist(melb2$Distance, col = c('orange'), main = "Melbourne Housing Distance", xlab = "Distance", density = 50)

#median(melb2$Distance) # Center of Data = 10.1 (Distance)

According to the table below, the suburb with the highest count of property sold is Reservoir with a total 160 properties.

#Top 10 suburb with the highest number of sold properties

freq_suburb <- table(melb2$Suburb) # Create frequency table
freq_suburb_df <- as.data.frame(freq_suburb) # Create data frame
colnames(freq_suburb_df) <- c("Suburb", "PropertiesSold") # Renaming columns
freq_suburb_df <- freq_suburb_df[order(-freq_suburb_df$PropertiesSold),]
top10_freq_suburb <- freq_suburb_df[1:10, ]
top10_freq_suburb

According to the table below, the suburb with the highest average price per SQM is Princes Hill with a price of $7453.

# Top 10 Suburb with highest average of buying price per square meter
average_price <- melb2 %>%
  group_by(Suburb) %>%
  summarise(Average_Price_Per_Square_Meter = median(PricePerSQM, na.rm = TRUE)) # Creating a table
average_price <- average_price[order(-average_price$Average_Price_Per_Square_Meter),] # Sorting order
colnames(average_price) <- c("Suburb", "Average price Per SQM") # Renaming columns
top10_freq_suburb2 <- average_price[1:10, ]
top10_freq_suburb2
boxplot(melb2$PricePerSQM~ melb2$Type,horizontal = F, col = c('orange'), main = "Melbourne Housing Price & Type", xlab = "Type", ylab = "Price Per SQM")

# h - house,cottage,villa, semi,terrace
# u - unit, duplex
# t - townhouse; dev site - development site

The boxplot indicates that the type of housing that has the highest median price per SQM is a Townhouse type, and then Housing, with a close third Unit type.

3.1 Relations between Price and Distance

#Plotting Price PerSQM and Distance to CBD
#cor(melb2$Distance, melb2$PricePerSQM) = -0.4333912
ggplot(data = melb2, aes(x = Distance, y = PricePerSQM, color = Type)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Relations between Price and Distance", subtitle = "Melbourne Housing",
    x = "Distance",
    y = "Price per SQM"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

The plot illustrates a correlation between house distance and price per square meter. A pronounced negative correlation is observed for the Housing Type, indicating that shorter distances correspond to higher prices per square meter. The Townhouse type also exhibits a negative correlation, although it is not as robust as the Housing Type. Finally, the Unit type shows a positive correlation, suggesting that longer distances are generally associated with higher prices per square meter.

3.2 Melbourne Housing Rooms & Price

#Plotting Price PerSQM and number of rooms
boxplot(melb2$PricePerSQM~ melb2$Rooms,horizontal = F, col = c('orange'), main = "Melbourne Housing Rooms & Price", xlab = "Rooms", ylab = "Price Per SQM")

The data indicating that housing units with two rooms command the highest median price per square meter suggests an intriguing trend in the real estate market. This finding prompts a deeper exploration into the factors influencing the perceived value of properties with a specific room count.

3.3 Melbourne Housing Bedroom & Price

#Plotting Price PerSQM and number of bedroom
boxplot(melb2$PricePerSQM~ melb2$Bedroom2,horizontal = F, col = c('orange'), main = "Melbourne Housing Bedroom & Price", xlab = "Bedroom", ylab = "Price Per SQM")

The observation that housing units with two bedrooms command the highest median price per square meter unveils a significant trend in the real estate market. This finding prompts a closer examination of the factors contributing to the premium associated with two-bedroom units. It could be an indicative of possibly catering to young professionals, couples, or small families seeking functional yet compact living arrangements. While the single bedroom has the lowest price among all of them.

3.4 Melbourne Housing Bathroom & Price

#Plotting Price PerSQM and number of bathroom
boxplot(melb2$PricePerSQM~ melb2$Bathroom,horizontal = F, col = c('orange'), main = "Melbourne Housing Bathroom & Price", xlab = "Bathroom", ylab = "Price Per SQM")

The data reveals that housing units with four bathrooms exhibit the highest median price per square meter suggests a correlation between bathroom quantity and perceived property value, potentially indicating a market segment placing premium importance on luxurious amenities.However, the price starts to drop-off after it reached the number 6 where maximum is close to the IQR of the box.

3.5 Melbourne Housing Region & Price

#Plotting Price PerSQM and region

ggplot(data = melb2, aes(x = Regionname, y = PricePerSQM, color = Type)) +
  geom_col(mapping = aes(fill=Type), position = "dodge") +
   theme(
    text = element_text(size = 7),# Adjust the font size
    panel.background = element_rect(fill="white")
  )

According to the barplot , it is observed that housing units located in the Southern Metropolitan region have the highest average price compared to other regions, while the South-Eastern region has the lowest average price. Typically, the housing type with the highest cost is Housing, followed by Townhouse, and then Unit.

3.6 Map of Housing Sold in Northern Metropolitan - Melbourne

# Map
melbmap <- subset(melb2, melb2$Regionname == "Northern Metropolitan")
melbmap <- data.frame(lat=melbmap$Lattitude, # latitude
                   lng=melbmap$Longtitude) # longitude
set.seed(50)
melbmap <- melbmap[sample(nrow(melbmap), 25), ]
# get icon
ico <- makeIcon(
    iconUrl = "https://cdn-icons-png.flaticon.com/512/25/25694.png",
    iconWidth=41/2,
    iconHeight=41/2
)


# create a leaflet map widget
map <- leaflet()
# add tiles from open street map
map <- addTiles(map = map)
# add markers
addMarkers(map = map, data = melbmap, icon = ico)

4 Conclusion

📌 In Melbourne, some of the most expensive houses often feature two roomy bedrooms and an impressive four bathrooms. Princes Hill, located in the Northern Metropolitan area, is one place where you can find such luxurious properties.

DOT POINTS

  1. The closer you are to the CBD, the higher the price

  2. The most sold properties is located at Reservoir Suburb

  3. Highest housing price in general:

  • Highest price of housing is located at Princes Hill which is part of Northern Metropolitan
  • Highest price of housing type is a townhouse
  • 2 Rooms
  • 2 Bedroom
  • 4 Bathroom